asked    Giles     2018-10-22       python       81 view        2 Answers

[SOLVED] Pandas - expand dataframe using daterange

I have the following dataframe:

 name    from       amount   days
 A       7/31/18    200      1
 B       7/31/18    300      1
 C       7/30/18    200      1
 D       7/27/18    100      3
 ......
 G       7/17/18    50       1
 H       7/13/18    150      4
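
For anyone who wants to reproduce this, here is a minimal sketch that builds only the rows shown above (the rows hidden behind "......" are left out, and the m/d/yy parsing format is my assumption about how the dates are stored):

```python
import pandas as pd

# Only the rows visible in the question; the elided rows are not guessed at.
df = pd.DataFrame({
    "name": ["A", "B", "C", "D", "G", "H"],
    "from": ["7/31/18", "7/31/18", "7/30/18", "7/27/18", "7/17/18", "7/13/18"],
    "amount": [200, 300, 200, 100, 50, 150],
    "days": [1, 1, 1, 3, 1, 4],
})

# Parse the m/d/yy strings into timestamps so date arithmetic works later.
df["from"] = pd.to_datetime(df["from"], format="%m/%d/%y")
print(df)
```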

I'd like to expand it to this, where days does not equal 1:

 name    from       amount   days
 A       7/31/18    200      1
 B       7/31/18    300      1
 C       7/30/18    200      1
 D       7/29/18    100      3
 D       7/28/18    100      3
 D       7/27/18    100      3
 ......
 G       7/17/18    50       1
 H       7/16/18    150      4
 H       7/15/18    150      4
 H       7/14/18    150      4
 H       7/13/18    150      4

If possible, I'd also like to add a column that can distinguish between the original data and expanded data (since I'm going to need to filter some dates eventually):

 name    from       amount   days   original
 A       7/31/18    200      1      1
 B       7/31/18    300      1      1
 C       7/30/18    200      1      1
 D       7/29/18    100      3      0
 D       7/28/18    100      3      0
 D       7/27/18    100      3      1
 ......
 G       7/17/18    50       1      1
 H       7/16/18    150      4      0
 H       7/15/18    150      4      0
 H       7/14/18    150      4      0
 H       7/13/18    150      4      1

Edit: To clarify the expansion: days tells you how many rows each entry should be expanded to. Alternatively, you can use the date in the row above as a boundary: the 7/27 entry with days=3 stops just before the row above it, which is dated 7/30. The data is constrained so that the expanded ranges never overlap.

  2 Answers  

        answered    Edith     2018-10-22      

Roughly two steps: build the expanded dataframe with reindex, then adjust the values (using duplicated to flag the original rows).

import numpy as np
import pandas as pd

newdf = df.reindex(df.index.repeat(df.days))  # repeat each row `days` times
# day offsets 0..days-1 for every original row, aligned to the repeated index
adddate = pd.Series(np.concatenate(df.days.apply(np.arange).values), index=newdf.index)
newdf['from'] = pd.to_datetime(newdf['from']) + pd.to_timedelta(adddate, unit='D')  # shift each copy forward
newdf['original'] = (~newdf.index.duplicated()).astype(int)  # first copy of each row is the original
newdf
Out[240]: 
  name       from  amount  days  original
0    A 2018-07-31     200     1         1
1    B 2018-07-31     300     1         1
2    C 2018-07-30     200     1         1
3    D 2018-07-27     100     3         1
3    D 2018-07-28     100     3         0
3    D 2018-07-29     100     3         0
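
The output above runs oldest to newest within each block, whereas the question lists each expanded block newest first. A minimal sketch of one way to get that order, assuming a default integer index; the small frame and the names offset and days are made up for illustration and are not part of the answer:

```python
import pandas as pd

# Hypothetical two-row sample, using rows D and H from the question.
df = pd.DataFrame({
    "name": ["D", "H"],
    "from": pd.to_datetime(["2018-07-27", "2018-07-13"]),
    "amount": [100, 150],
    "days": [3, 4],
})

# Repeat each row `days` times; the index still points at the source row.
newdf = df.loc[df.index.repeat(df["days"])]

# Position of each copy within its source row: 0, 1, 2, ...
offset = newdf.groupby(level=0).cumcount().to_numpy()
days = newdf["days"].to_numpy()

newdf = newdf.reset_index(drop=True)
# Count down instead of up, so the latest date comes first.
newdf["from"] = newdf["from"] + pd.to_timedelta(days - 1 - offset, unit="D")
# The last copy of each row carries the original start date.
newdf["original"] = (offset == days - 1).astype(int)
print(newdf)
```

Working on plain NumPy arrays for the offsets sidesteps any index alignment on the duplicated labels.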


        answered    Denise     2018-10-22      

Comprehension

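df['from'] = pd.to_datetime(df['from'])

pd.DataFrame([
    (n, f, a, d, int(f == F))                   # flag the row that keeps the original date
    for n, F, a, d in zip(*map(df.get, df))     # iterate rows as (name, from, amount, days)
    for f in pd.date_range(F, periods=d)[::-1]  # d consecutive days, latest first
], columns=[*df.columns] + ['original'])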

   name       from  amount  days  original
0     A 2018-07-31     200     1         1
1     B 2018-07-31     300     1         1
2     C 2018-07-30     200     1         1
3     D 2018-07-29     100     3         0
4     D 2018-07-28     100     3         0
5     D 2018-07-27     100     3         1
6     G 2018-07-17      50     1         1
7     H 2018-07-16     150     4         0
8     H 2018-07-15     150     4         0
9     H 2018-07-14     150     4         0
10    H 2018-07-13     150     4         1

Helper Functions

I edited my answer to use duplicated instead of cumcount. I got the idea from @Wen's post.

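def f(x):
    # dates for one group: start at the earliest date, one per row, newest first
    return pd.date_range(
        pd.to_datetime(x).min(),
        periods=len(x)
    ).sort_values(ascending=False)

def g(d):
    # rebuild 'from' per name (assumes each name appears on one original row)
    return d.groupby('name')['from'].transform(f)

def h(d):
    # the last copy of each name holds the original date
    return 1 - d.name.duplicated(keep='last')

# repeat each row `days` times, then recompute 'from' and add the flag
df.loc[df.index.repeat(df.days)].assign(**{'from': g, 'original': h})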

  name       from  amount  days  original
0    A 2018-07-31     200     1         1
1    B 2018-07-31     300     1         1
2    C 2018-07-30     200     1         1
3    D 2018-07-29     100     3         0
3    D 2018-07-28     100     3         0
3    D 2018-07-27     100     3         1
4    G 2018-07-17      50     1         1
5    H 2018-07-16     150     4         0
5    H 2018-07-15     150     4         0
5    H 2018-07-14     150     4         0
5    H 2018-07-13     150     4         1
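
On newer pandas (0.25 and later) the same expansion can also be written with DataFrame.explode, which neither answer uses. This is only a sketch under that assumption; the three-row frame and the temporary start column are invented for illustration:

```python
import pandas as pd

# Hypothetical sample, using rows C, D and H from the question.
df = pd.DataFrame({
    "name": ["C", "D", "H"],
    "from": pd.to_datetime(["2018-07-30", "2018-07-27", "2018-07-13"]),
    "amount": [200, 100, 150],
    "days": [1, 3, 4],
})

# One list of dates per row, newest first; a Series keeps the ragged lists as objects.
ranges = pd.Series(
    [list(pd.date_range(start, periods=n)[::-1])
     for start, n in zip(df["from"], df["days"])],
    index=df.index,
)

out = (
    df.assign(start=df["from"], **{"from": ranges})  # keep the original date for the flag
      .explode("from")                               # one row per date in each list
      .reset_index(drop=True)
)
out["from"] = pd.to_datetime(out["from"])            # explode leaves an object column
out["original"] = (out["from"] == out["start"]).astype(int)
out = out.drop(columns="start")
print(out)
```

Comparing against the saved start date, rather than relying on duplicated names, keeps the flag correct even if a name were to appear on more than one original row.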



